
Statistical constraints on climate model parameters using a scalable cloud-based inference framework

Published online by Cambridge University Press:  05 July 2023

James Carzon*
Affiliation:
Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, USA
Bruno Abreu
Affiliation:
National Center for Supercomputing Applications, University of Illinois Urbana-Champaign, Urbana-Champaign, IL, USA
Leighton Regayre
Affiliation:
Institute for Climate and Atmospheric Science, School of Earth and Environment, University of Leeds, Leeds, United Kingdom; Met Office Hadley Centre, Exeter, United Kingdom
Kenneth Carslaw
Affiliation:
Institute for Climate and Atmospheric Science, School of Earth and Environment, University of Leeds, Leeds, United Kingdom
Lucia Deaconu
Affiliation:
Atmospheric, Oceanic and Planetary Physics Department, University of Oxford, Oxford, United Kingdom; Faculty of Environmental Science and Engineering, Babes-Bolyai University, Cluj, Romania
Philip Stier
Affiliation:
Atmospheric, Oceanic and Planetary Physics Department, University of Oxford, Oxford, United Kingdom
Hamish Gordon
Affiliation:
Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA, USA; Center for Atmospheric Particle Studies, Carnegie Mellon University, Pittsburgh, PA, USA
Mikael Kuusela
Affiliation:
Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, USA; NSF AI Planning Institute for Data-Driven Discovery in Physics, Carnegie Mellon University, Pittsburgh, PA, USA
*
Corresponding author: James Carzon; Email: jcarzon@andrew.cmu.edu

Abstract

Atmospheric aerosols influence the Earth’s climate, primarily by affecting cloud formation and scattering visible radiation. However, aerosol-related physical processes in climate simulations are highly uncertain. Constraining these processes could help improve model-based climate predictions. We propose a scalable statistical framework for constraining the parameters of expensive climate models by comparing model outputs with observations. Using the C3.AI Suite, a cloud computing platform, we efficiently train a surrogate model on a perturbed parameter ensemble of the UKESM1 climate model. A method for estimating a data-driven model discrepancy term is described. The strict bounds method is applied to quantify parametric uncertainty in a principled way. We demonstrate the scalability of this framework with 2 weeks’ worth of simulated aerosol optical depth data over the South Atlantic and Central African region, written from the model every 3 hr and matched in time to twice-daily MODIS satellite observations. When constraining the model using real satellite observations, we establish constraints on combinations of two model parameters using much higher time-resolution outputs from the climate model than previous studies. This result suggests that, within the limits imposed by an imperfect climate model, potentially very powerful constraints may be achieved when our framework is scaled to the analysis of more observations over longer time periods.

Type
Methods Paper
Creative Commons
Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open materials
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Impact Statement

Atmospheric aerosols influence the amount of solar radiation reflected by Earth, but the magnitude of the effect is highly uncertain, and this is one of the key reasons why climate predictions are highly uncertain. We propose a framework for reducing uncertainty in aerosol effects on radiation by comparing simulations from complex climate models to satellite observations. This framework uses parallel computing and statistical theory to ensure efficient computations and valid inferences.

1. Introduction

Atmospheric aerosols affect the formation of clouds and scatter and absorb visible radiation, thereby influencing Earth’s climate. Improved estimates of the change in the total effect of aerosols on the climate over the industrial era—a highly uncertain quantity termed the aerosol effective radiative forcing (ERF)—have the potential to reduce uncertainty in the sensitivity of the climate to these aerosols. Recent efforts to constrain ERF have involved first reducing uncertainty in the distributions of more basic aerosol-related physical parameters and then studying the effects of these constraints on ERF. This has notably been done by comparing simulated data with observations collected on various global aircraft and ship campaigns as well as from satellites and ground stations (Johnson et al., 2018, 2020; Regayre et al., 2018, 2020). Toward constraining the aerosol radiative forcing, Regayre et al. (2023) employ simulated outputs from the UKESM1 climate model that are averaged over month-long periods. In contrast, the present work employs three-hourly simulation outputs from the same model, requiring an inferential framework capable of handling this increase in the resolution and quantity of the data.

Establishing constraints on the input parameters of expensive computer models by comparing their outputs with observational data is an area of active research (Biegler et al., 2010). It is often imperative that a surrogate model be trained from an ensemble of model input–output pairs and used in place of the simulator to ensure tractable computations, but this step contributes an additional source of uncertainty. If the outputs, or surrogate outputs, do not match observations within some tolerance, then those parameter values are deemed implausible. In Johnson et al. (2020) and Regayre et al. (2020), the method of history matching is used, a technique from oil reservoir engineering that has been adapted to the evaluation of computer models more generally in recent decades (Verly et al., 1984; Craig et al., 1997; Johansen, 2008; Bower et al., 2010). However, the aim of history matching is to constrain parameter spaces and not necessarily to provide well-understood probabilistic guarantees on those constraints.

In contrast with previous work on constraining climate model parameters, our framework draws on a recent surge of interest in simulator-based inference (Schafer and Stark, 2009; Cranmer et al., 2020; Dalmasso et al., 2023) to produce parameter constraints that provide rigorous statistical guarantees of frequentist coverage. Specifically, our work deals with a special case of simulator-based inference where the observations are given by a deterministic simulator and an additive noise model. Patil et al. (2022) and Stanley et al. (2022) use a strict bounds method (Stark, 1992) to construct efficient confidence sets for the model parameters in closely related inverse problems in remote sensing and high energy physics. Unlike in these works, where the forward models of interest are linear and known exactly, the present problem features a forward model (UKESM1) which is nonlinear and estimated using an emulator. We take advantage of the strict bounds method while inverting the emulated forward model numerically and accounting for emulation uncertainty.

Our framework also offers a novel means of accounting for the systematic disagreement, known as the model discrepancy, between a simulator and the physical system which it purports to model. A number of approaches to accounting for model discrepancy in computer model calibration or simulator-based inference have been developed (Kennedy and O’Hagan, 2001; Higdon et al., 2008; McNeall et al., 2016). We propose a new data-driven procedure for incorporating model discrepancy (and other sources of error that we cannot separately quantify, such as representation errors; Schutgens et al., 2017) into the strict bounds inversion framework. Cloud-based computing resources are leveraged to make each step in the framework computationally scalable.

1.1. Data sources

Aerosol optical depth (AOD) is a measure of the amount of aerosol in the atmosphere. It is measured by the MODerate-resolution Imaging Spectroradiometer (MODIS) instruments on board NASA’s Terra and Aqua satellites, which offer near-global coverage twice daily and provide easily readable open-access data sets. In the flow chart in Figure 1, this data set is the SatelliteTimeseries. For the present application, we focus on MODIS retrievals from the South Atlantic and Central African region over July 1–14, 2017. This domain and period of study are selected for their known biomass burning-related atmospheric aerosol activity.

Figure 1. The flow chart for our pipeline for building frequentist confidence sets on climate model parameters. After matching both the satellite observation and model output grids, five steps of processing follow. The EmulatorTraining and EmulatorEvaluation pipes provide scalability to the framework by leveraging parallel computing in these most expensive steps. The DataDrivenModelDiscrep, PlausibilityTest, and FrequentistConfSet pipes implement the strict bounds-based method to ensure principled uncertainty quantification.

We compare these data with climate model outputs taken from the UKESM1 model (Sellar et al., 2019). We use simulations that were nudged to the observed meteorology following the method of Telford et al. (2008), so the simulated weather conditions are sufficiently realistic for us to examine the ability of the model to represent the observed aerosols. We use the perturbed parameter ensemble (PPE) of 221 atmosphere-only simulations documented by Regayre et al. (2023), which have a configuration that closely matches the atmosphere component of the UKESM1 model used in the CMIP6 experiments (Sellar et al., 2019). For each ensemble member, let $ u $ be a vector of parameter inputs that determine climatic aerosol-related processes of practical importance, and let $ x $ be a vector of control variables that define the specific output of the climate model, denoted $ \eta \left(x,u\right) $ , representing the climate observable $ \zeta (x) $ . In our setting, $ x $ denotes a latitude–longitude–time triple in the climate model output’s spatiotemporal grid, denoted $ {\mathcal{M}}_{\mathrm{sim}} $ and called the SimulatedGrid in Figure 1. The parameters $ u $ are listed in Table 1. The ensemble is a set of simulations, in notation

$$ {D}_{\mathrm{train}}=\left\{\left(x,{u}^j,\eta \left(x,{u}^j\right)\right):x\in {\mathcal{M}}_{\mathrm{sim}},j=1,2,\dots, 221\right\}. $$
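To make the layout of $ {D}_{\mathrm{train}} $ concrete, the following minimal Python sketch shows one natural way to hold the ensemble in memory. The array names and the cell count are illustrative placeholders, not the structures used in our pipeline.

```python
# A minimal layout sketch of D_train (illustrative names and sizes,
# not the authors' pipeline objects).
import numpy as np

n_members, n_params = 221, 17   # PPE members and the Table 1 parameters
n_cells = 4000                  # placeholder for |M_sim|, the spatiotemporal cells x

U_train = np.empty((n_members, n_params))    # row j is the parameter vector u^j
eta_train = np.empty((n_cells, n_members))   # entry (x, j) is eta(x, u^j)

# Every cell x shares the same 221 input vectors, so each per-cell emulator
# in Section 2.1 is trained on the pair (U_train, eta_train[x]).
```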

Table 1. Seventeen of the 37 UKESM1 parameters (Regayre et al., 2023) used to build the surrogate model, selected based on relevance for predicting AOD

Note: $ {N}_d $ is the cloud droplet number concentration. SF is short for scale factor.

For reference, this notation, along with other notation used throughout the paper, is summarized in Table 2.

Table 2. A notational reference table

1.2. Problem setup

In order to emulate the model output for unobserved parameter values, we assume as in Johnson et al. (2020) that at $ x\in {\mathcal{M}}_{\mathrm{sim}} $ , the model output is a realization of a Gaussian process,

(1) $$ {\tilde{\eta}}_x(u)\sim \mathcal{GP}\left[{m}_x\left(\cdot \right),{k}_x\left(\cdot, \cdot \right)\right]\hskip3em \left(u\in {\mathrm{\mathbb{R}}}^p\right). $$

For each $ x $ , we train the surrogate model $ {\tilde{\eta}}_x $ as described in Section 2.1.

Let $ z(x) $ denote the AOD observations and $ {u}^{\ast } $ the true parameter value. Assuming that the emulated climate model is unbiased, these observations and parameters are related according to the equations

$$ z(x)=\zeta (x)+{\varepsilon}_{\mathrm{meas},x}=\unicode{x1D53C}\hskip0.35em \left[{\tilde{\eta}}_x\left({u}^{\ast}\right)|{D}_{\mathrm{train}}\right]+{\varepsilon}_{\mathrm{emu},x}\left({u}^{\ast}\right)+{\varepsilon}_{\mathrm{meas},x}+{\varepsilon}_{\mathrm{other},x}. $$

The different sources of variance in the measurements—namely, the measurement uncertainty ( $ {\varepsilon}_{\mathrm{meas},x} $ ), the emulation uncertainty ( $ {\varepsilon}_{\mathrm{emu},x} $ ), and any other sources ( $ {\varepsilon}_{\mathrm{other},x} $ ) which are not separately quantified, including uncertainty due to model discrepancy and representation error in the measurements—are assumed to be mean zero and independent across $ x $ . This is a simplifying assumption in that, in reality, these terms might be correlated across $ x $ . By further assuming that $ {\varepsilon}_{\mathrm{meas},x} $ and $ {\varepsilon}_{\mathrm{other},x} $ are Gaussian, the observations $ z(x) $ are jointly normally distributed across the $ x $ on the spatiotemporal grid $ {\mathcal{M}}_{\mathrm{sat}} $ (SatelliteGrid) on which the MODIS retrievals are gridded. In particular,

(2) $$ z={\left(z(x)\right)}_{x\in {\mathcal{M}}_{\mathrm{sat}}}\sim N\left(\unicode{x1D53C}\left[{\left({\tilde{\eta}}_x\left({u}^{\ast}\right)\right)}_{x\in {\mathcal{M}}_{\mathrm{sat}}}|{D}_{\mathrm{train}}\right],{\Sigma}_{\mathrm{meas}}+{\Sigma}_{\mathrm{emu}}+{\delta}^2{I}_{\mid {\mathcal{M}}_{\mathrm{sat}}\mid}\right), $$

where $ {\Sigma}_{\mathrm{meas}} $ and $ {\Sigma}_{\mathrm{emu}} $ are covariance matrices such that entry $ {\Sigma}_{\mathrm{meas},i,j}=\mathrm{Var}\left({\varepsilon}_{\mathrm{meas},{x}^i}\right) $ is the measurement uncertainty at location $ {x}^i $ when $ i=j $ , otherwise zero (i.e., it is a diagonal matrix and the measurement errors are assumed to be uncorrelated between locations); $ {\Sigma}_{\mathrm{emu},i,j}=\mathrm{Var}\left({\varepsilon}_{\mathrm{emu},{x}^i}\left({u}^{\ast}\right)\right) $ is the emulation uncertainty when $ i=j $ , otherwise zero; and $ {\delta}^2=\mathrm{Var}\left({\varepsilon}_{\mathrm{other},\mathrm{x}}\right) $ is a homoscedastic variance term standing in for all other unaccounted-for errors. The matrix $ {I}_{\mid {\mathcal{M}}_{\mathrm{sat}}\mid } $ is an identity matrix with its number of rows equal to the size of the set $ {\mathcal{M}}_{\mathrm{sat}} $ . The disagreement between grids $ {\mathcal{M}}_{\mathrm{sim}} $ and $ {\mathcal{M}}_{\mathrm{sat}} $ is addressed in the following section. The $ {\Sigma}_{\mathrm{emu},i,i} $ are modeled by the surrogate model (see Section 2.1). The $ {\Sigma}_{\mathrm{meas},i,i} $ are the published MODIS uncertainties, which may not account for all possible problems with the retrievals, but these unaccounted-for uncertainties will be captured by $ {\delta}^2 $ , which is estimated from the observations (see Section 2.4).

2. Inference Framework

We used the C3.AI Suite, a cloud computing platform for data analytics workflows deployed to Microsoft Azure infrastructure (C3 Enterprise AI, n.d.). The platform combines databases, open-source packages, and proprietary machine-learning workflows optimized for large-scale, data-intensive applications. We built new data structures and methods for processing NETCDF4 files containing high-dimensional time-series datasets. We also developed a scalable inference pipeline for training and predicting with several thousand Gaussian process models using asynchronous processing such as parallel batch and map-reduce jobs. This pipeline is summarized in Figure 1 and complements other recently published workflows for similar tasks (e.g., Watson-Parris et al., 2021).

The grid of satellite measurements is finer than the grid of simulations. The raw MODIS retrievals are pre-processed algorithmically by the MODIS team before they are provided on a $ 1{}^{\circ}\times 1{}^{\circ } $ spatial grid in the Level-3 Global Gridded Atmosphere Product (Hubank et al., 2020). However, the model outputs are on a $ 1.35{}^{\circ}\times 1.875{}^{\circ } $ grid. To reconcile these differences, we match simulated grid cells to their nearest neighbor on the Level-3 MODIS product grid in space and time. Differences are computed on the resulting $ 1.35{}^{\circ}\times 1.875{}^{\circ } $ resolution MatchedGrid, denoted $ \mathcal{M} $ .
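The nearest-neighbor matching can be implemented, for example, with a k-d tree. The sketch below illustrates the spatial part; the function and variable names are hypothetical, and a production version would also need to handle longitude wrap-around and the time axis.

```python
# Nearest-neighbour grid matching with a k-d tree (hypothetical names; a full
# version would also handle longitude wrap-around and the time dimension).
import numpy as np
from scipy.spatial import cKDTree

def match_grids(sim_lats, sim_lons, sat_lats, sat_lons):
    """Map each 1.35 x 1.875 degree model cell to its nearest 1 x 1 degree
    MODIS Level-3 cell; returns flat indices into the satellite grid."""
    sat_pts = np.column_stack([g.ravel() for g in
                               np.meshgrid(sat_lats, sat_lons, indexing="ij")])
    sim_pts = np.column_stack([g.ravel() for g in
                               np.meshgrid(sim_lats, sim_lons, indexing="ij")])
    _, idx = cKDTree(sat_pts).query(sim_pts)  # Euclidean nearest neighbour in degrees
    return idx

# Example over roughly the South Atlantic / Central African study region:
idx = match_grids(np.arange(-30, 10, 1.35), np.arange(-20, 40, 1.875),
                  np.arange(-30.5, 10.5, 1.0), np.arange(-20.5, 40.5, 1.0))
```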

2.1. Emulate: Inside the EmulatorTraining pipe

As indicated in Eq. (1), we assume the climate model is a realization of a Gaussian process where for each $ x\in \mathcal{M} $ the mean and anisotropic exponential covariance functions are

$$ {m}_x(u)={\beta}_{0,x},\hskip2em {k}_x(u,{u}^{\prime})={\beta}_{1,x}^2\exp \left(-\sqrt{\sum \limits_{i=1}^p\frac{{({u}_i-{u}_i^{\prime})}^2}{\ell_{i,x}^2}}\right). $$

We choose this $ {k}_x $ because a relatively rough process appears reasonable in this problem. We place an anisotropy assumption on the model by fitting a different length scale parameter $ {\mathrm{\ell}}_{i,x} $ for each model parameter $ {u}_i $ . By fitting $ {\mathrm{\ell}}_{i,x} $ separately for each $ x $ , we let the parameters’ effects on the model output vary geographically.

We train the emulating Gaussian processes by estimating their parameters on $ {D}_{\mathrm{train}} $ using maximum likelihood (Rasmussen and Williams, 2006) with the scikit-learn Python library and the L-BFGS-B optimization algorithm, and thus we obtain a collection of models $ {\tilde{\eta}}_x $ , $ x\in \mathcal{M} $ . Figure 2 illustrates the quality of these emulators by verifying that the emulated AOD varies with respect to active parameters exactly as the training data set would suggest. In terms of scalability, an ordinary Gaussian process model on the entire ModelTimeseries would compute in $ O\left({\left(|\mathcal{M}|n\right)}^3\right) $ time, $ n $ being the number of members in the PPE, whereas the EmulatorTraining pipe trains in $ O\left(|\mathcal{M}|{n}^3\right) $ time. This routine is parallelized by distributing batches of training jobs to independent worker nodes on the C3.AI Suite cluster, greatly reducing the wall-clock time.
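As a concrete illustration, one per-cell fit could be set up in scikit-learn roughly as follows. The Matern kernel with $ \nu =0.5 $ coincides with the anisotropic exponential covariance above, and `normalize_y` plays the role of the constant mean $ {\beta}_{0,x} $ ; the restart count is our choice for this sketch, which is not our exact pipeline code.

```python
# Sketch of training one per-cell emulator with scikit-learn. Matern with
# nu=0.5 is the anisotropic exponential covariance k_x; normalize_y stands in
# for the constant mean beta_{0,x}.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern

def train_cell_emulator(U_train, y_train):
    """U_train: (221, 17) PPE inputs; y_train: (221,) simulated AOD at one cell x."""
    kernel = ConstantKernel(1.0) * Matern(length_scale=np.ones(U_train.shape[1]),
                                          nu=0.5)   # beta_{1,x}^2 * exponential kernel
    gp = GaussianProcessRegressor(kernel=kernel,
                                  normalize_y=True,
                                  optimizer="fmin_l_bfgs_b",  # L-BFGS-B, as in the text
                                  n_restarts_optimizer=2)     # guard against local optima
    return gp.fit(U_train, y_train)

# The EmulatorTraining pipe runs one such fit per cell x in M, in parallel batches.
```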

Figure 2. Sample curves of the emulated response $ \unicode{x1D53C}\left[{\tilde{\eta}}_x(u)\hskip0.1em |\hskip0.1em {D}_{\mathrm{train}}\right] $ averaged over two MODIS observing times on July 1, 2017 for two locations $ x $ . (Left) Red gridpoints are missing MODIS AOD retrievals. Green gridpoints are ruled out as outliers per Section 2.2. (Top right) The scattered points are from $ {D}_{\mathrm{train}} $ , and the 221 curves are slices of the trained emulator response surface where all of the parameters are fixed to their training values from $ {D}_{\mathrm{train}} $ except the parameter labeling the $ x $ axis of each subplot, which is varied within its range given in Table 1. Near $ \left(0{}^{\circ},20{}^{\circ}\right) $ , AOD decreases as the accumulation dry deposition rate increases. The average MODIS measurement is given by the dashed line. (Bottom right) At $ \left(-20{}^{\circ},-20{}^{\circ}\right) $ , emulated AOD responds positively to the sea spray emission flux.

2.2. Predict: Inside the EmulatorEvaluation pipe

We uniformly sample 5,000 new parameter vectors $ {u}^k $ within the ranges given in Table 1 (in contrast with the 221 training vectors) to obtain a testing sample

$$ {D}_{\mathrm{test}}=\left\{\left(x,{u}^k,\unicode{x1D53C}\left[{\tilde{\eta}}_x\left({u}^k\right)\hskip0.1em |\hskip0.1em {D}_{\mathrm{train}}\right],\mathrm{Var}\left[{\tilde{\eta}}_x\left({u}^k\right)\hskip0.1em |\hskip0.1em {D}_{\mathrm{train}}\right]\right):x\in \mathcal{M},k=1,2,\dots, 5000\right\}. $$

This set pairs points in the spatiotemporal-parametric space with the emulated mean and variance of the AOD response, specifying a distribution closely mimicking the model response surface $ \eta \left(x,u\right) $ . We expect that this number of sampling points $ {u}^k $ is sufficient to reliably constrain a small number of parameters. These predictions are performed efficiently by a map-reduce job across the collection of models $ {\tilde{\eta}}_x $ .
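A minimal sketch of this sampling and prediction step is given below; the parameter bounds are placeholders for the Table 1 ranges, and the loop over cells is what the map-reduce job distributes.

```python
# Sketch of the EmulatorEvaluation step (Table 1 bounds replaced by placeholders).
import numpy as np

rng = np.random.default_rng(0)
lo, hi = np.zeros(17), np.ones(17)             # stand-ins for the Table 1 ranges
U_test = rng.uniform(lo, hi, size=(5000, 17))  # the 5,000 test vectors u^k

def evaluate_cell(gp, U_test):
    """Emulated mean and variance of AOD at one cell x for every u^k."""
    mean, sd = gp.predict(U_test, return_std=True)
    return mean, sd ** 2

# In the pipeline this is a map-reduce job over the fitted models; serially it
# would read: D_test = {x: evaluate_cell(gps[x], U_test) for x in cells}.
```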

The surrogate model appears to perform well at most locations in the spatiotemporal domain. However, gross discrepancies between the observations and the model output arise in a small fraction of the grid points, and these cannot be accounted for using our model discrepancy term (described in Section 2.3). In particular, when we consider the distance metric

$$ {J}_{x,\gamma}\left({u}^k\right)=\frac{\mid \unicode{x1D53C}\left[{\tilde{\eta}}_x\left({u}^k\right)\hskip0.1em |\hskip0.1em {D}_{\mathrm{train}}\right]-z(x)\mid }{\sqrt{\mathrm{Var}\left[{\tilde{\eta}}_x\left({u}^k\right)\hskip0.1em |\hskip0.1em {D}_{\mathrm{train}}\right]+\mathrm{Var}\left[{\varepsilon}_{\mathrm{meas},x}\right]+{\gamma}^2}}, $$

we identify about 2% of the grid points for which $ {J}_{x,\gamma}\left({u}^k\right) $ is a visible outlier for all $ {u}^k $ , $ k=1,2,\dots, \mathrm{5,000} $ . The parameter $ \gamma $ here was tuned visually based on QQ plots, since the variance parameter $ \delta $ is not yet estimated at this stage. Let $ {\mathcal{M}}^{\ast } $ be the remaining coordinates in the spatiotemporal grid that have not been excluded either as outliers or due to missingness.
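The screening rule can be expressed compactly as follows; `cutoff` plays the role of the visually chosen QQ-plot threshold, which is an assumption of this sketch.

```python
# Sketch of the outlier screen: drop a cell x only when the standardized
# residual J is extreme for every one of the 5,000 test vectors u^k.
import numpy as np

def is_outlier_cell(mean_x, var_x, z_x, meas_var_x, gamma2, cutoff):
    """mean_x, var_x: emulated mean/variance at cell x for all u^k (shape (5000,));
    z_x, meas_var_x: the MODIS retrieval and its variance at x;
    cutoff: visually chosen threshold (an assumption of this sketch)."""
    J = np.abs(mean_x - z_x) / np.sqrt(var_x + meas_var_x + gamma2)
    return bool(np.all(J > cutoff))   # outlying for all u^k, not merely some
```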

2.3. Discrepancy: Inside the DataDrivenModelDiscrep pipe

The quantity $ {\delta}^2 $ in Eq. (2) is the variance of the unaccounted-for uncertainty $ {\varepsilon}_{\mathrm{other},x} $ in our model, which we estimate using maximum likelihood. From our Gaussian assumption, the likelihood for the unknowns $ u $ and $ {\delta}^2 $ is

$$ L\left(u,{\delta}^2;{D}_{\mathrm{train}}\right)=\prod \limits_{x\in {\mathcal{M}}^{\ast }}\frac{1}{\sqrt{2\pi }}{\left({\sigma}_{x,\mathrm{emu}}^2(u)+{\sigma}_{\mathrm{meas},x}^2+{\delta}^2\right)}^{-\frac{1}{2}}\exp \left\{-\frac{1}{2}\frac{{\left[{\hat{\eta}}_x(u)-z(x)\right]}^2}{\sigma_{x,\mathrm{emu}}^2(u)+{\sigma}_{\mathrm{meas},x}^2+{\delta}^2}\right\}, $$

where

$$ {\sigma}_{x,\mathrm{emu}}^2(u)=\mathrm{Var}\left[{\tilde{\eta}}_x(u)\hskip0.1em |\hskip0.1em {D}_{\mathrm{train}}\right],\hskip1em {\sigma}_{\mathrm{meas},x}^2=\mathrm{Var}\left[{\varepsilon}_{\mathrm{meas},x}\right],\hskip1em {\hat{\eta}}_x(u)=\unicode{x1D53C}\left[{\tilde{\eta}}_x(u)\hskip0.1em |\hskip0.1em {D}_{\mathrm{train}}\right]. $$

To numerically obtain the maximum likelihood estimate for $ {\delta}^2 $ , we compute the maximizing value $ {\hat{\delta}}_k^2 $ of $ \log L $ for each of the test parameters $ {u}^k $ using the scipy Python library’s implementation of Brent’s algorithm (Press et al., 1992); then among these maximizing values we select the one which gives the overall maximum likelihood over $ k=1,2,\dots, \mathrm{5,000} $ . The resulting estimate $ {\hat{\delta}}_{\mathrm{MLE}}^2={\hat{\delta}}_{\hat{k}}^2 $ , where $ \hat{k}=\arg \hskip0.1em {\max}_k\log L\left({u}^k,{\hat{\delta}}_k^2;{D}_{\mathrm{train}}\right) $ , is an approximate maximum likelihood estimator, the approximation arising because the search over the parameter space is finite.
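A sketch of this profiling procedure is below: scipy’s bounded scalar minimizer (a Brent-type method) is applied to the negative log-likelihood in $ {\delta}^2 $ for each $ {u}^k $ ; the upper search bound is an assumption of the sketch.

```python
# Sketch of the delta^2 estimate: profile the Gaussian log-likelihood in
# delta^2 for each u^k with a bounded Brent-type search, then keep the
# overall maximizer across k.
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_lik(delta2, resid, var_emu, var_meas):
    """resid: eta_hat_x(u) - z(x) over the cells in M*; variances aligned with
    it. Omits the constant (|M*|/2) log(2*pi), which does not affect argmax."""
    v = var_emu + var_meas + delta2
    return 0.5 * np.sum(np.log(v) + resid ** 2 / v)

def profile_delta2(resid, var_emu, var_meas, upper=1.0):
    res = minimize_scalar(neg_log_lik, bounds=(0.0, upper), method="bounded",
                          args=(resid, var_emu, var_meas))
    return res.x, -res.fun   # maximizing delta^2_k and its log-likelihood

# delta^2_MLE is the delta^2_k attaining the largest profiled log-likelihood
# over k = 1, ..., 5000.
```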

This part of the pipeline runs quickly. Evaluating the expression for the likelihood and performing the optimization routine took about 5 min of wall-clock time in our case and can be performed in local memory. Our value for the estimator on the remaining data was $ {\hat{\delta}}_{\mathrm{MLE}}^2=0.025 $ , which is of similar magnitude to the average measurement variance of $ {\overline{\sigma}}_{\mathrm{meas}}^2=0.027 $ and the average emulation variance of $ {\overline{\sigma}}_{\mathrm{emu}}^2\left(\hat{u}\right)=0.038 $ .

2.4. Test: Inside the PlausibilityTest pipe

To obtain a confidence set for the underlying atmospheric parameters, we perform a test for parameter plausibility using MODIS AOD observations on the MatchedGrid. Following the terminology used by Johnson et al. (2020), we write down the implausibility metric

$$ I(u)=\sqrt{\sum \limits_{x\in {\mathcal{M}}^{\ast }}{\left(\frac{\unicode{x1D53C}\left[{\tilde{\eta}}_x(u)\hskip0.1em |\hskip0.1em {D}_{\mathrm{train}}\right]-z(x)}{\sqrt{\mathrm{Var}\left[{\tilde{\eta}}_x(u)\hskip0.1em |\hskip0.1em {D}_{\mathrm{train}}\right]+\mathrm{Var}\left[{\varepsilon}_{\mathrm{meas},x}\right]+{\hat{\delta}}_{\mathrm{MLE}}^2}}\right)}^2}. $$

For each $ u $ in $ {D}_{\mathrm{test}} $ , we compare our observed implausibility measure against its approximate distribution under the null hypothesis that $ u $ is the correct parameter, $ {H}_0:I(u)\sim \sqrt{\chi^2\left( df=|{\mathcal{M}}^{\ast }|\right)} $ . Here we use the facts that the sum of $ n $ squared independent standard Gaussian random variables is distributed as $ {\chi}^2\left( df=n\right) $ and that $ n $ is assumed to be large enough that we can ignore the variation in the model-discrepancy variance estimate $ {\hat{\delta}}_{\mathrm{MLE}}^2 $ . In other words, testing at the 0.05 significance level, a parameter vector $ u $ will be deemed implausible if $ I(u) $ exceeds the $ 95 $ th percentile of the above null distribution.
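Since $ I{(u)}^2 $ is referred to a $ {\chi}^2 $ distribution with $ \mid {\mathcal{M}}^{\ast}\mid $ degrees of freedom, the test reduces to a few lines; the function below is an illustrative sketch of that comparison, not pipeline code.

```python
# Sketch of the plausibility test: I(u)^2 is compared against the chi-squared
# distribution with |M*| degrees of freedom, per the null H_0 in the text.
import numpy as np
from scipy.stats import chi2

def is_plausible(mean_u, var_u, z, meas_var, delta2_mle, alpha=0.05):
    """mean_u, var_u: emulated mean/variance on M* for one u;
    z, meas_var: the MODIS retrievals and their variances on M*."""
    I2 = np.sum((mean_u - z) ** 2 / (var_u + meas_var + delta2_mle))
    return I2 <= chi2.ppf(1 - alpha, df=z.size)   # True = fail to reject H_0
```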

A method for obtaining confidence sets inspired by the application of history matching in Johnson et al. (2020) can also be derived. We find that inference based on this method is sensitive to the choice of tolerance level that is explicit in their choice of implausibility statistic. For a discussion of how our test differs from this instance of history matching, see the Appendix in the Supplementary material.

2.5. Infer: Inside the FrequentistConfSet pipe

Having obtained a collection of test results for each $ {u}^k $ , $ k=1,2,\dots, \mathrm{5,000} $ , we approximate the Neyman inversion (Dalmasso et al., 2023) of the test for plausibility described above by retaining all those parameters $ {u}^k $ for which we do not reject the null at the 0.05 significance level, thereby obtaining an approximate $ 95\% $ confidence set (see footnote 1). This is the region of the 17-dimensional parameter space which exclusively contains non-implausible parameter vectors. A two-dimensional projection of this set is shown on the right of Figure 3.
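The inversion step itself is then a simple filtering operation, and a binned projection in the style of Figure 3(d) is a ratio of two 2-D histograms; the sketch below illustrates both, reusing the names from the earlier sketches (U_test, is_plausible).

```python
# Sketch of the Neyman inversion and a Figure 3(d)-style binned projection.
import numpy as np

def confidence_set(U_test, plausible):
    """Retain the u^k not rejected at level 0.05: an approximate 95% set.
    plausible: boolean array, one test result per u^k."""
    return U_test[plausible]

def binned_projection(U_test, plausible, i, j, bins=20):
    """Proportion of plausible vectors per 2-D bin over parameters i and j."""
    counts, xe, ye = np.histogram2d(U_test[:, i], U_test[:, j], bins=bins)
    hits, _, _ = np.histogram2d(U_test[plausible, i], U_test[plausible, j],
                                bins=[xe, ye])
    with np.errstate(invalid="ignore"):
        return hits / counts   # NaN where a bin contains no test vectors
```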

Figure 3. Parameter constraints at 95% confidence level. (a–c) One-dimensional projections of the FrequentistConfSet described in Section 2.5. The 95th percentile of the approximate null distribution $ {H}_0 $ is indicated by the horizontal red lines. The sea spray emission flux parameter appears to be on the verge of being constrained on its own from only 2 weeks of data. (d) The space spanned by the BVOC SOA and accumulation mode dry deposition rate parameters is binned, and the color of each bin shows the proportion of plausible parameter values inside. Dark purple indicates a proportion of zero—evidently, the lower right corner of this space is ruled out as implausible.

3. Results

Using our strict bounds-based test for parameter plausibility, we obtain a simultaneous 95% confidence set on the selected climate model parameters. We find that large values of the sea spray emission flux parameter are on the verge of being constrained with just 2 weeks of AOD data, as shown in the upper left panel of Figure 3. However, the two plots on the bottom left show that for no level of either BVOC SOA or the accumulation mode dry deposition rate does our test for plausibility always fail, and so formal constraints on these cannot be obtained. To illustrate this point, suppose that the value of the accumulation dry deposition rate parameter (bottom left of Figure 3) was wrongly set to 1 when the true value is 10. Then there are combinations of the other 16 selected parameters that enable us to fit the model outputs to the observed AOD data within the modeled uncertainties. Hence we cannot rule out the value 1 for the accumulation dry deposition rate parameter, and likewise for the other values of each parameter. If evaluated at a lower confidence level, our results seem broadly consistent with Regayre et al. (2023), who constrain the dry deposition rate toward large values, and with Regayre et al. (2020) and Johnson et al. (2020), who also constrain the sea spray emission parameter.

We are able to obtain a constraint at the 95% confidence level on the combination of the deposition rate of dry aerosols in the accumulation mode and biogenic secondary organic aerosol from volatile organic compounds. See the right panel of Figure 3 for a binned projection of the resulting confidence set onto their span. The constraint has the following physical meaning: if one hypothesizes that vegetation emits a large amount of aerosol while the deposition rate of relatively large particles is low, then one will overestimate AOD so severely that a significant region of the subspace of these two parameters can be ruled out as implausible at the 95% confidence level, even after controlling for the uncertainty of the MODIS retrieval estimates due to instrumental error, the imperfection of our surrogate model, and the other sources of model discrepancy that we estimate for our climate model.

4. Conclusion

To the best of our knowledge, this is the first use of simulated AOD from a climate model at a time resolution as high as three-hourly to obtain observation-based constraints on input parameters likely to regulate AOD. That a salient and meaningful constraint has been gleaned from just 2 weeks’ worth of data suggests promising uses of high time-resolution data in the future. Our framework is well suited for problem settings where a perturbed parameter ensemble is available for one’s climate simulator and where Gaussian process emulation is appropriate. Notably, any unquantified sources of uncertainty in this setting are accounted for by the data-driven model discrepancy term built into the presented pipeline, an aspect that differs from other recent scalable frameworks for model calibration, such as ESEm (Watson-Parris et al., 2021). Our approach assumes that the model discrepancy can be captured by an additive Gaussian error that is independent across space and time, so our constraints rely on these assumptions being at least approximately satisfied. A fundamental difference between our approach and the previous history matching approach of Johnson et al. (2020) is that our method provides frequentist confidence sets with well-defined probabilistic guarantees. In addition, as described in the Appendix in the Supplementary material, our method has a potential advantage over Johnson et al. (2020) in that the latter is sensitive to the tuning of its tolerance level. The computational cost of our pipeline is dominated by the Gaussian process computations, so the approach is computationally feasible as long as enough parallel processing resources are available for training and evaluating the pixel-wise emulators.

There are limitations to what can be achieved: with an imperfect and over-parameterized model, constraints can become inconsistent, or different parameter combinations can yield the same simulated ERF (Lee et al., 2016). However, employing more quantities for which we have observational and simulated data would nonetheless allow us to constrain a larger number of these parameters and get the most stringent constraints on aerosol radiative forcing we can. Other observable atmospheric quantities such as sulfates or organic carbon are sensitive to different sets of atmospheric parameters than those to which aerosol optical depth is sensitive, yielding potential opportunities to further constrain the parameter space using larger, more diverse observational data sets.

Acknowledgements

We are grateful to the two anonymous referees for the insightful comments we received through the peer review process for this manuscript. We also thank the Carnegie Mellon University Statistical Methods for the Physical Sciences (STAMPS) Research Group for helpful discussions and feedback at various stages throughout this work.

Author contribution

Conceptualization: H.G., M.K.; Formal analysis: J.C.; Methodology: J.C., M.K., K.C., L.R., H.G.; Climate simulations: L.R., K.C., L.D., P.S.; Analysis software: B.A., J.C.; Writing–original draft: J.C.; Writing–review and editing: J.C., M.K., H.G., L.R., P.S.

Competing interest

The authors declare that no competing interests exist.

Data availability statement

The MODIS AOD measurements supporting the results of this work can be accessed through the MODIS web page: https://modis.gsfc.nasa.gov/data/dataprod/mod04.php (last access: 17 January 2023). Output from the A-CURE PPE is available on the CEDA archive (Regayre et al., 2023). The code to reproduce results from this paper can be found on GitHub: https://github.com/c3aidti/smoke/tree/main/climateInformatics2023 (last version: 31 March 2023).

Funding statement

J.C., M.K., and H.G. acknowledge grant funding from the C3.AI Digital Transformation Institute. M.K. was also supported in part by NSF grants DMS-2053804 and PHY-2020295, and H.G. by NASA Grant No. 80NSSC21K1344. L.R. and K.C. acknowledge funding from NERC under grants AEROS, ACID-PRUF, GASSP, and A-CURE (NE/G006172/1, NE/I020059/1, NE/J024252/1, and NE/P013406/1). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Provenance statement

This article is part of the Climate Informatics 2023 proceedings and was accepted in Environmental Data Science on the basis of the Climate Informatics peer review process.

Supplementary material

The supplementary material for this article can be found at http://doi.org/10.1017/eds.2023.12.

Footnotes

This research article was awarded an Open Materials badge for transparent practices. See the Data Availability Statement for details.

1 Note that the repetition of this single hypothesis test across candidate parameter values is for inversion purposes. Since we invert one distinct test rather than performing many, we do not face multiple testing issues in controlling the Type I error.

References

Biegler, L, Biros, G, Ghattas, O, Heinkenschloss, M, Keyes, D, Mallick, B, Marzouk, Y, Tenorio, L, van Bloemen Waanders, B and Willcox, K (2010) Large-Scale Inverse Problems and Quantification of Uncertainty. Hoboken, NJ: John Wiley & Sons.
Bower, RG, Goldstein, M and Vernon, I (2010) Galaxy formation: A Bayesian uncertainty analysis. Bayesian Analysis 5(4), 619–670.
C3 Enterprise AI (n.d.) Available at https://c3.ai/ (accessed 13 December 2022).
Craig, PS, Goldstein, M, Scheult, AH and Smith, JA (1997) Pressure matching for hydrocarbon reservoirs: A case study in the use of Bayes linear strategies for large computer experiments. In Gatsonis, C, Hodges, JS, Kass, RE, McCulloch, R, Rossi, P and Singpurwalla, ND (eds), Case Studies in Bayesian Statistics. New York, NY: Springer.
Cranmer, K, Brehmer, J and Louppe, G (2020) The frontier of simulation-based inference. Proceedings of the National Academy of Sciences of the United States of America 117(48), 30055–30062.
Dalmasso, N, Masserano, L, Zhao, D, Izbicki, R and Lee, AB (2023) Likelihood-free frequentist inference: Confidence sets with correct conditional coverage. Preprint, arXiv:2107.03920.
Higdon, D, Gattiker, J, Williams, B and Rightley, M (2008) Computer model calibration using high-dimensional output. Journal of the American Statistical Association 103(482), 570–583.
Hubank, P, Platnick, S, King, M and Ridgway, B (2020) MODIS atmosphere L3 gridded product algorithm theoretical basis document (ATBD) & users guide. Available at https://atmosphere-imager.gsfc.nasa.gov/sites/default/files/ModAtmo/documents/L3_ATBD_C6_C61_2020_08_06.pdf.
Johansen, K (2008) Statistical methods for history matching. PhD thesis, Technical University of Denmark, Kongens Lyngby, Denmark.
Johnson, JS, Regayre, LA, Yoshioka, M, Pringle, KJ, Lee, LA, Sexton, DMH, Rostron, JW, Booth, BBB and Carslaw, KS (2018) The importance of comprehensive parameter sampling and multiple observations for robust constraint of aerosol radiative forcing. Atmospheric Chemistry and Physics 18(17), 13031–13053.
Johnson, JS, Regayre, LA, Yoshioka, M, Pringle, KJ, Lee, LA, Sexton, DMH, Rostron, JW, Booth, BBB and Carslaw, KS (2020) Robust observational constraint of uncertain aerosol processes and emissions in a climate model and the effect on aerosol radiative forcing. Atmospheric Chemistry and Physics 20(15), 9491–9524.
Kennedy, MC and O’Hagan, A (2001) Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63(3), 425–464.
Lee, LA, Reddington, CL and Carslaw, KS (2016) On the relationship between aerosol model uncertainty and radiative forcing uncertainty. Proceedings of the National Academy of Sciences of the United States of America 113(21), 5820–5827.
McNeall, D, Williams, J, Booth, B, Betts, R, Challenor, P, Wiltshire, A and Sexton, D (2016) The impact of structural error on parameter constraint in a climate model. Earth System Dynamics 7(4), 917–935.
Patil, P, Kuusela, M and Hobbs, J (2022) Objective frequentist uncertainty quantification for atmospheric CO2 retrievals. SIAM/ASA Journal on Uncertainty Quantification 10(3), 827–859.
Press, W, Teukolsky, SA, Vetterling, WT and Flannery, BP (1992) Numerical Recipes in C. Cambridge: Cambridge University Press.
Rasmussen, CE and Williams, CKI (2006) Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press.
Regayre, LA, Deaconu, L, Grosvenor, DP, Sexton, DMH, Symonds, C, Langton, T, Watson-Parris, D, Mulcahy, JP, Pringle, KJ, Richardson, M, Johnson, JS, Rostron, JW, Gordon, H, Lister, G, Stier, P and Carslaw, KS (2023) Identifying climate model structural inconsistencies allows for tight constraint of aerosol radiative forcing. EGUsphere preprint. https://egusphere.copernicus.org/preprints/2023/egusphere-2023-77/.
Regayre, LA, Johnson, JS, Yoshioka, M, Pringle, KJ, Sexton, DMH, Booth, BBB, Lee, LA, Bellouin, N and Carslaw, KS (2018) Aerosol and physical atmosphere model parameters are both important sources of uncertainty in aerosol ERF. Atmospheric Chemistry and Physics 18(13), 9975–10006.
Regayre, LA, Schmale, J, Johnson, JS, Tatzelt, C, Baccarini, A, Henning, S, Yoshioka, M, Stratmann, F, Gysel-Beer, M, Grosvenor, DP and Carslaw, KS (2020) The value of remote marine aerosol measurements for constraining radiative forcing uncertainty. Atmospheric Chemistry and Physics 20(16), 10063–10072.
Schafer, CM and Stark, PB (2009) Constructing confidence regions of optimal expected size. Journal of the American Statistical Association 104(487), 1080–1089.
Schutgens, N, Tsyro, S, Gryspeerdt, E, Goto, D, Weigum, N, Schulz, M and Stier, P (2017) On the spatio-temporal representativeness of observations. Atmospheric Chemistry and Physics 17(16), 9761–9780.
Sellar, AA, Jones, CG, Mulcahy, JP, Tang, Y, Yool, A, Wiltshire, A, O’Connor, FM, Stringer, M, Hill, R, Palmieri, J, Woodward, S, de Mora, L, Kuhlbrodt, T, Rumbold, ST, Kelley, DI, Ellis, R, Johnson, CE, Walton, J, Abraham, NL, Andrews, MB, Andrews, T, Archibald, AT, Berthou, S, Burke, E, Blockley, E, Carslaw, K, Dalvi, M, Edwards, J, Folberth, GA, Gedney, N, Griffiths, PT, Harper, AB, Hendry, MA, Hewitt, AJ, Johnson, B, Jones, A, Jones, CD, Keeble, J, Liddicoat, S, Morgenstern, O, Parker, RJ, Predoi, V, Robertson, E, Siahaan, A, Smith, RS, Swaminathan, R, Woodhouse, MT, Zeng, G and Zerroukat, M (2019) UKESM1: Description and evaluation of the U.K. Earth System model. Journal of Advances in Modeling Earth Systems 11, 4513–4558. https://doi.org/10.1029/2019MS001739.
Stanley, M, Patil, P and Kuusela, M (2022) Uncertainty quantification for wide-bin unfolding: One-at-a-time strict bounds and prior-optimized confidence intervals. Journal of Instrumentation 17(10), P10013.
Stark, PB (1992) Inference in infinite-dimensional inverse problems: Discretization and duality. Journal of Geophysical Research 97(B10), 14055–14082.
Telford, PJ, Braesicke, P, Morgenstern, O and Pyle, JA (2008) Technical note: Description and assessment of a nudged version of the new dynamics unified model. Atmospheric Chemistry and Physics 8(6), 1701–1712.
Verly, G, David, M, Journel, AG and Marechel, A (1984) Geostatistics for Natural Resources Characterization, Part 2. Dordrecht: D. Reidel Publishing Company.
Watson-Parris, D, Williams, A, Deaconu, L and Stier, P (2021) Model calibration using ESEm v1.1.0—An open, scalable Earth system emulator. Geoscientific Model Development 14(12), 7659–7672.
